166 PART 4 Comparing Groups

that follows the chi-square distribution (also covered in Chapter 24). So the test

statistic from this test should follow the chi-square distribution. Now it is obvious

why it is named the chi-square test! The next step is to obtain the p value for the

test statistic. To do that manually, you would look up the test statistic (which is

8.81 in our case) in a chi-square table.

In actuality, the chi-square distribution refers to a family of distributions. Which

chi-square distribution you are using depends upon a number called the degrees of

freedom, abbreviated d.f. or df or by the Greek lowercase letter nu (ν) (in this book

we use df). The df is a measure of the independence among the terms that make up the

test statistic: it counts how many of them are free to vary once the row and column totals are fixed.

How would you calculate the df for a chi-square test? The answer is it depends on

the number of rows and columns in the cross-tab. For the 2 × 2

cross-tab (fourfold table) in this

example, you added up the four values in Figure 12-5, so you may think that you

should look up the 8.81 chi-square value with 4 df. But you’d be wrong. Note the

italicized word independence in the preceding paragraph. And keep in mind that

the differences (Ob − Ex) in any row or column always add up to zero. The four

terms making up the 8.81 total aren’t independent of each other. It turns out that

the chi-square test statistic for a fourfold table has only 1 df, not 4. In general, an

N-by-M table, with N rows, M columns, and therefore N × M cells, has only

(N − 1) × (M − 1) df because of the constraints on the row and column sums. In our

case, N (the number of rows) is 2, so N − 1 is 1. Also, M (the

number of columns) is 2, so M − 1 is 1 also (and 1 times 1 is 1). Don’t feel bad if

this wrinkle caught you by surprise; even Karl Pearson, who invented the

chi-square test, got that part wrong!
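
To see the constraint at work, here’s a minimal Python sketch (Python is our choice for illustration; the text’s own lookup uses R). The counts are hypothetical and are not the values from Figure 12-5, and the helper names expected_counts and degrees_of_freedom are made up for this example. It shows that the Ob − Ex differences add up to zero in every row and column, and that a table with N rows and M columns has (N − 1) × (M − 1) df.

```python
# Hypothetical 2 x 2 counts for illustration only (NOT the values from Figure 12-5).
observed = [[30, 10],
            [20, 40]]


def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df for an N-by-M cross-tab: (N - 1) * (M - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


expected = expected_counts(observed)

# Ob - Ex for every cell.
diffs = [[o - e for o, e in zip(obs_row, exp_row)]
         for obs_row, exp_row in zip(observed, expected)]

row_sums = [sum(row) for row in diffs]        # one sum per row
col_sums = [sum(col) for col in zip(*diffs)]  # one sum per column
df = degrees_of_freedom(observed)

print("Ob - Ex:", diffs)
print("Row sums:", row_sums)
print("Column sums:", col_sums)
print("df:", df)
```

Because every row and column of differences has to sum to zero, knowing one difference in a fourfold table pins down the other three, which is why only 1 df is left.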

So, if you were to manually look up the chi-square test statistic of 8.81 in a

chi-square table, you would have to look under the distribution for 1 df to find out

the p value. Alternatively, if you got this far and you wanted to use the statistical

software R to look up the p value, you would use the following code: pchisq(8.81, 1,

lower.tail = FALSE). Either way, the p value for chi-square = 8.81, with 1 df, is 0.003.

This means that, if CBD and NSAIDs truly didn’t differ, there’s only a 0.003 probability that random

fluctuations alone could produce an effect at least as large as the one seen, where CBD performs so differently from NSAIDs with

respect to pain relief in chronic arthritis patients. A 0.003 probability is the same

as 1 chance in 333 (because 1/0.003 ≈ 333), meaning very unlikely, but not impossible.

So, if you set α = 0.05, because 0.003 < 0.05, your conclusion would be that

in the chronic arthritis patients in our sample, whether the participant took CBD

or NSAIDs was statistically significantly associated with whether or not they felt

pain relief.
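
If you’d rather check the lookup without R, here’s a rough Python sketch that uses only the standard library. It leans on two closed-form shortcuts that happen to exist for these particular df values (they aren’t a general chi-square routine), and the function names chi2_pvalue_1df and chi2_pvalue_4df are made up for this example. It reproduces the p value for 8.81 with 1 df, and it also shows what you would have gotten had you mistakenly used 4 df.

```python
import math


def chi2_pvalue_1df(x):
    """Upper-tail p value for chi-square with 1 df.

    A chi-square variable with 1 df is a squared standard normal,
    so P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def chi2_pvalue_4df(x):
    """Upper-tail p value for chi-square with 4 df.

    For 4 df the tail area has the closed form exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


statistic = 8.81  # chi-square test statistic from the text
alpha = 0.05      # significance level from the text

p_1df = chi2_pvalue_1df(statistic)
p_4df = chi2_pvalue_4df(statistic)

print(round(p_1df, 3))   # 0.003
print(round(p_4df, 3))   # 0.066
print(round(1 / 0.003))  # 333, the "1 chance in 333" figure
print(p_1df < alpha)     # True: significant at alpha = 0.05
```

Notice that with the wrong df (4), the same 8.81 gives a p value of about 0.066, which is above 0.05, so getting the df right can change your conclusion.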